Chinese character handwriting recognition system

ABSTRACT

A handwritten Chinese character input method and system is provided to allow users to enter Chinese characters to a data processor by adding fewer than three strokes and one selection movement such as mouse clicking or stylus or finger tapping. The system is interactive, predictive, and intuitive to use. By adding the one or two strokes that start a Chinese character, or in some cases no strokes at all, users can find a desired character from a list of characters. The list is context sensitive: it varies depending on the prior character entered. Compared to other existing systems, this system can save users considerable time and effort in entering handwritten characters.

CROSS REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 10/205,950, filed Jul. 25, 2002.

BACKGROUND OF THE INVENTION

1. Technical Field

This invention relates generally to text input technology. More particularly, the invention relates to a method and system that allows users to input handwritten Chinese characters to a data processor by entering the first few strokes required to write a character, so that users can perform character input tasks in a fast, predictive way.

2. Description of the Prior Art

Around the globe, over 1.2 billion people speak Chinese. This includes the People's Republic of China, Taiwan, Singapore, and a large community of overseas Chinese in Asia and North America. Chinese character strokes and symbols are so different and so complicated that they can be sorted and grouped in a wide variety of ways. One can analytically sort out as many as 35-40 strokes of 4-10 symbols or more per Chinese character, depending on how they are grouped. Because of this unique structure of the Chinese language, computer users cannot input Chinese characters using alphabetic keyboards as easily as they can input Western languages.

A number of methods and systems for inputting Chinese characters to screen, such as the Three Corners method, Goo Coding System, 5-Stroke method, Changjie's Input scheme, etc., have been developed. However, none of these input methods provides an easy-to-use, standardized input/output scheme that speeds up the retrieval and typewriting process by taking full advantage of computer technology.

Several other methods and systems for inputting handwritten Chinese characters are also known. For example, Apple Computer and the Institute of System Science in Singapore (Apple-ISS) have developed a system which features an application for dictation and a handwriting input method for Chinese. This system incorporates a dictionary assistance service wherein when a first character is recognized, the device displays a list of phrases based on the first character and the user may select the proper phrase without inputting any stroke. This technique effectively increases the input speed.

Another example is Synaptics' QuickStroke system which incorporates a prediction function based on a highly sophisticated neural network engine. This is not a graphics capture application where the users have to write out the entire character before the software can recognize which character is intended. Instead, it can recognize a character after only three to six strokes of the character have been written. It can be used with a standard mouse, Synaptics TouchPad™, or a Synaptics pen input TouchPad.

Another example is Zi Corporation's text input solutions based on an intelligent indexing engine which intuitively predicts and displays desired candidates. The solutions also include powerful personalization and learning capabilities, providing prediction of user-created terms and frequently used vocabulary.

It would be advantageous to provide a handwritten Chinese character input method and system to allow users to enter Chinese characters to a data processor by drawing just the first few strokes and one selection movement such as mouse clicking or stylus or finger tapping.

SUMMARY OF INVENTION

A handwritten Chinese character input method and system is provided to allow users to enter Chinese characters to a data processor by drawing just the first few strokes and one selection movement such as mouse clicking or stylus or finger tapping. The system is interactive, predictive, and intuitive to use. By adding the one or two strokes that start a Chinese character, users can find a desired character from a list of characters. The list is context sensitive, so in some cases no strokes are needed. It varies depending on the prior character entered. The system puts the handwritten-stroke-to-category mapping on top of the stroke category matching technology, including an optional “Match any stroke category” key or gesture. Compared to other existing systems, this system can save users considerable time and effort in entering handwritten characters.

In one preferred embodiment, the handwritten Chinese character input system includes: (1) recognition means for recognizing a category of handwriting stroke from a list of stroke categories; (2) collection means for organizing a list of characters that commonly start with one or more recognized categories of handwriting strokes, the list of characters being displayed in a predetermined sequence; and (3) selection means for selecting a desired character from the list of characters.

In a typical embodiment, the strokes are classified into five basic categories, each having one or more sub-categories. The collection means contains predefined stroke order information. It also contains a display means to display a list of most frequently used characters when no strokes are entered, while strokes are being entered, and/or after a character is selected. The list of most frequently used characters is context sensitive. It varies depending upon the last Chinese character entered. The predetermined sequence may be based on any of: (1) number of strokes necessary to write out a character; (2) use frequency of a character; and (3) contextual relation to the last character entered.

The selection means is associated with any of: (1) mouse clicking; (2) stylus tapping; (3) finger tapping; and (4) button/key pressing.

The system also contains “stroke entry means,” such as an LCD touchscreen, stylus or finger pad, trackball, data glove, or other touch-sensitive (possibly flexible) surface.

The system may further include means for displaying a numeric or iconic representation of each stroke that is entered and a full numeric or iconic representation of strokes for a Chinese character that is selected.

According to the preferred embodiment, a method for inputting handwritten Chinese characters includes the following steps:

-   adding a stroke into the stroke recognition apparatus;
-   categorizing the added stroke into one of a predetermined number of categories;
-   finding characters based on frequency of character use;
-   displaying a list of found characters;
-   if a desired character is in the list, selecting the desired character from the list;
-   if a desired character is not visible in the list, adding another stroke;
-   finding most common characters that appear after a previously selected character based on a present stroke sequence; and
-   displaying another list of found characters.

The method may further comprise the steps of:

-   displaying a numeric representation for a stroke that is added; and
-   displaying full stroke numeric representation for a character that is selected.

As an alternative, the method may comprise the steps of:

-   displaying an iconic representation for a stroke that is added; and
-   displaying full stroke iconic representation for a character that is selected.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram illustrating an apparatus for inputting handwritten Chinese characters according to one preferred embodiment of the invention;

FIG. 2 is a flow diagram illustrating a method for inputting handwritten Chinese characters in a predictive manner according to another preferred embodiment of the invention;

FIG. 3 is a diagram illustrating five basic strokes and their numeric representation;

FIG. 4A is a pictorial diagram illustrating an overview of the Stroke Recognition Interface prior to any input;

FIG. 4B is a pictorial diagram illustrating the Stroke Recognition Interface when a first single horizontal stroke is added;

FIG. 4C is a pictorial diagram illustrating the Stroke Recognition Interface when a second horizontal stroke is added;

FIG. 4D is a pictorial diagram illustrating the Stroke Recognition Interface when a third horizontal stroke is added;

FIG. 4E is a pictorial diagram illustrating the Stroke Recognition Interface when a desired character appears to be the first character in the Selection List;

FIG. 4F is a pictorial diagram illustrating the Stroke Recognition Interface when the first character in the selection list is selected;

FIG. 4G is a pictorial diagram illustrating the Stroke Recognition Interface when a desired character is not the first character in the selection list;

FIG. 4H is a pictorial diagram illustrating the Stroke Recognition Interface when the desired character rather than the first character in the selection list is selected;

FIG. 4I is a pictorial diagram illustrating the Stroke Recognition Interface when the first desired character is selected and a stroke is added for another character;

FIG. 4J is a pictorial diagram illustrating the Stroke Recognition Interface when two strokes are added;

FIG. 4K is a pictorial diagram illustrating the Stroke Recognition Interface when a third stroke is added;

FIG. 4L is a pictorial diagram illustrating the Stroke Recognition Interface where the desired character is indicated;

FIG. 4M is a pictorial diagram illustrating the Stroke Recognition Interface when the second desired character is selected;

FIG. 4N is a pictorial diagram illustrating the Stroke Recognition Interface where a third desired character appears in the most frequently used characters;

FIG. 4O is a pictorial diagram illustrating the Stroke Recognition Interface when a third desired character is selected without adding any stroke; and

FIG. 5 is a schematic diagram illustrating the input interface for touchscreen PDA according to the most preferred embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a schematic diagram illustrating an apparatus for inputting handwritten Chinese characters according to one preferred embodiment of this invention. The apparatus includes three basic components: a Stroke Recognition Interface 20 for recognizing entered stroke patterns, an Input Device 24 for entering strokes, and a Processor 30 for performing data processing tasks.

The Stroke Recognition Interface 20 has three basic areas: a Message Display Area 28, a Selection List Area 26, and a Stroke Input Area 22.

Message Display Area 28 is the place where the selected characters are displayed. It represents an email or SMS message, or whatever application intends to use the generated text.

Selection List Area 26 is the place to display the most common character choices for the strokes currently entered on the stroke input window. This area may also list common characters that follow the last character in the Message Display Area 28, that also begin with the strokes entered in the Stroke Input Area 22.

Stroke Input Area 22 is the heart of the Stroke Recognition Interface 20. The user begins drawing a character onscreen in this area, using an Input Device 24 such as a stylus, a finger, or a mouse, depending on the input device and display device used. The display device echoes and retains each stroke (an “ink trail”) until the character is selected.

Stroke Recognition Interface 20 may further include a Stroke Number Display Area to display the interface's interpretation, either numeric or iconic, of the strokes entered by the user. When a character is selected, the full stroke representation, either by numbers or by icons, is displayed here. This area is optional, but could be useful for helping users learn stroke orders and stroke categories.

The system may further include: the capability to match Latin letters, punctuation symbols, and emoticons with user-defined stroke sequences; user-defined gestures for predefined stroke categories, and unique gestures representing entire components, sequences, or symbols; learning of and adaptation to the user's handwriting style, skew, or cursive; an optional training session with known characters; optional prompting of the user to clarify between ambiguous stroke interpretations, and/or a means to enter explicit strokes (e.g. via stroke category keys), and/or to remedy a stroke misinterpretation; optional indication of the level of confidence of stroke interpretations, e.g. color-coding each “ink trail” or a smiley face that frowns when it is uncertain; means to display all strokes that make up a character (e.g. drag and drop from a text editor to the Stroke [Number] Display Area); as well as the ability to delete the last stroke(s), and the corresponding ink trail(s), in reverse order by some means.

FIG. 2 is a flow diagram illustrating a method for inputting handwritten Chinese characters in a predictive manner according to the preferred embodiment of the invention. The method includes the following steps:

-   Step 50: Adding a stroke into the Stroke Input Area 22;
-   Step 52: Categorizing the added stroke into a stroke category;
-   Step 54: Finding characters based on frequency of character use;
-   Step 56: Displaying a list of found characters. The list of characters is displayed in a predetermined sequence. The predetermined sequence may be based on (1) number of strokes necessary to write out a Chinese character; (2) use frequency of a Chinese character entered; or (3) contextual relation to the prior character entered;
-   Step 58: Checking whether the desired character is in the list;
-   Step 60: If the desired character is not in the list, adding the next stroke in the Stroke Input Area 22;
-   Step 70: If a desired character is in the list, selecting it by clicking a mouse or tapping a stylus or finger, depending on the input and display devices used;
-   Step 72: Putting the selected character in the Message Display Area 28;
-   Step 74: Checking whether the message is complete;
-   Step 76: Adding the next stroke if the message is not complete;
-   Step 62 (continued from Step 60 or Step 76): Finding most common characters that appear after a previously selected character based on a present stroke sequence (this also happens before the first stroke, i.e. before Step 50); and
-   Step 80: Displaying a list of found characters, after which the process continues at Step 58.
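As an illustration only, Steps 54 through 60 can be sketched in Python as a prefix lookup over stroke-category sequences, with stroke categories written as the digits 1 to 5. The lexicon, its frequency counts, and the names LEXICON and find_candidates are hypothetical and are not part of the disclosed system; the stroke sequences shown are assumed to follow the five-category convention of FIG. 3.

```python
# Toy lexicon: character -> (stroke-category sequence, hypothetical usage count).
LEXICON = {
    "一": ("1", 900),
    "二": ("11", 400),
    "三": ("111", 350),
    "十": ("12", 500),
    "土": ("121", 120),
    "士": ("121", 80),
    "王": ("1121", 200),
    "天": ("1134", 450),
    "春": ("111342511", 150),
}


def find_candidates(strokes, limit=10):
    """Return characters whose stroke sequence starts with `strokes`,
    most frequently used first (Steps 54 and 56)."""
    matches = [
        (char, freq)
        for char, (sequence, freq) in LEXICON.items()
        if sequence.startswith(strokes)
    ]
    matches.sort(key=lambda item: item[1], reverse=True)
    return [char for char, _ in matches[:limit]]


# Each added stroke narrows the list (Steps 58 and 60).
for entered in ("", "1", "11", "111"):
    print(repr(entered), find_candidates(entered))
```

In this sketch an empty stroke sequence returns the overall most frequent characters, and two characters with the same stroke sequence (here 土 and 士) both remain in the list, so the user's selection is what tells them apart.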

The apparatus may have a function to actively display the interface's interpretation, either numeric or iconic, of the strokes entered by the user. Therefore, the method described above may further comprise the steps of:

-   Displaying a numeric representation for a stroke that is added;
-   Displaying full stroke numeric representation for a character that is selected;
-   Displaying an iconic representation for a stroke that is added; and
-   Displaying full stroke iconic representation for a character that is selected.

As an alternative, Step 54 may be replaced by:

-   Finding characters that commonly start with one or more recognized stroke patterns.

FIG. 3 is a diagram showing five basic strokes and their numeric representation. There is a government standard of five stroke categories for simplified Chinese characters. There are other classifications of the stroke categories. The method and system according to this invention apply to any kind of classification.

One of the major advantages of the recognition system according to this invention is the great reduction of ambiguities arising in the subtle distinction between certain subtypes of the stroke categories. To reduce ambiguities, there are further definitions of the subtypes. For example, a horizontal line with a slight hook upwards is stroke 1; a horizontal line with a slight hook down is stroke 5; a horizontal line angled upwards is stroke 1; a curved line that starts right diagonally then evens out to horizontal or curves up is stroke 4; and so on.

One technique for resolving, or at least limiting, ambiguities, is the use of limited wildcards. These are stroke keys that match with any stroke that fits one type of ambiguity. For example, if the stroke may fit into either stroke category 4 or stroke category 5, the limited wildcard would match both 4 and 5.
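One possible way to model a limited wildcard, offered as a sketch rather than as the disclosed implementation, is to treat every entered stroke as a set of admissible categories. A confidently recognized stroke is a one-element set, the limited wildcard for the 4-or-5 ambiguity is the set {4, 5}, and the “match any stroke category” key is the full set. All names and the three sample stroke sequences below are assumptions made for illustration.

```python
ANY = frozenset("12345")         # "match any stroke category"
FOUR_OR_FIVE = frozenset("45")   # limited wildcard for a 4-versus-5 ambiguity


def exact(category):
    """An unambiguous stroke: only one category is admissible."""
    return frozenset(category)


def matches(sequence, entered):
    """True if the character's stroke sequence begins with strokes that are
    each admissible for the corresponding entered stroke."""
    if len(sequence) < len(entered):
        return False
    return all(actual in allowed for actual, allowed in zip(sequence, entered))


# Illustrative stroke-category sequences (assumed, five-category convention).
SEQUENCES = {"王": "1121", "天": "1134", "无": "1135"}


def candidates(entered):
    return [char for char, seq in SEQUENCES.items() if matches(seq, entered)]


# The fourth stroke could be category 4 or 5, so both readings stay in play.
print(candidates([exact("1"), exact("1"), exact("3"), FOUR_OR_FIVE]))
```

Representing strokes as sets keeps the matching rule identical for exact strokes, limited wildcards, and the full wildcard.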

Often the differences between a stroke of one type and a similar stroke of another type are too subtle for a computer to differentiate. This gets even more confusing when the user is sloppy and curves his straight strokes, or straightens his curved strokes, or gets the angle slightly off.

To account for all of the variation of an individual user, the system may learn the specific idiosyncrasies of its one user, and adapt to fit that person's handwriting style.

Another approach is for the user to exaggerate the distinguishing features of each stroke. The specifics of the exaggeration needed may be determined as appropriate. Key to this aspect of the invention is that the user has to make diagonal strokes very diagonal, straight strokes very straight, curved strokes very curved, and angled strokes very angled.

The result on paper is a character that would look somewhat artificial, a caricature of its intended character. However, this greatly simplifies the disambiguation process for finding the strokes, which then helps the disambiguation of characters.

The operation process is described in the following paragraphs in conjunction with a series of pictorial diagrams.

FIG. 4A illustrates an overview of the Stroke Recognition Interface before any stroke is added. Note that the Character Selection List shows the first ten most frequently used characters. If a user's first desired character is in the list, he just selects the character by clicking the mouse or by tapping a stylus or his finger, without need to add a stroke. If the desired character is not in the list, the user adds a stroke using mouse, stylus, or finger.

FIG. 4B illustrates the Stroke Recognition Interface when a first single horizontal stroke is added. The stroke category is determined to be “1”, and is listed in the Stroke Number Area. The Selection List is re-ordered to predict the most likely character to be chosen based on the first stroke.

FIG. 4C illustrates the Stroke Recognition Interface when a second horizontal stroke is added. After a second horizontal line is entered, the selection list is re-ordered again, showing only the most likely characters that start with two horizontal lines (stroke category 1). Note that the position and relative lengths of the strokes do not affect the selection list, only the stroke categories.

FIG. 4D illustrates the Stroke Recognition Interface when a third horizontal stroke is added. After a third horizontal line is entered, the selection list is re-ordered again, showing only the most likely characters that start with three horizontal lines (stroke category 1).

FIG. 4E illustrates the Stroke Recognition Interface when a desired character appears to be the first character in the Selection List. Note that the character drawn so far is identical to the first character listed in the selection list. If this were the character desired, simply click that character from the list.

FIG. 4F illustrates the Stroke Recognition Interface when the first character in the selection list is selected. If the user chooses the first character, it is added to the message; at the same time, the stroke numbers are displayed at the bottom, and the input area is cleared, ready for the next character. Note that selecting a character takes one more action (a mouse click, or a stylus or finger tap) than the number of strokes entered. Novice users may find this annoying until they get used to the system and learn to take advantage of its predictive features.

FIG. 4G illustrates the Stroke Recognition Interface when a desired character is not the first character in the selection list. The strength of this system is its predictive ability. If the user desires the very complex, but somewhat common, character pointed to in the figure, he need not complete the strokes for that character. As soon as it is displayed in the selection list, it can be selected by clicking a mouse (or tapping a stylus or finger) on the character.

FIG. 4H illustrates the Stroke Recognition Interface when the desired character rather than the first character in the selection list is selected. Once the complex character is selected, we see that it is a 15-stroke character, added to the message with only three strokes and one additional click. The user gets a 15-stroke character using four movements. The saving of movement and hence time is about four to one. Additionally, the entire stroke order is displayed now, so if the user was used to an alternate stroke order for the character, he can learn the Government Standard stroke order used by this system.

FIG. 4I illustrates the Stroke Recognition Interface when the first desired character is selected and a stroke is added for another character. Once the character is entered, the program is ready to accept the strokes for another character. Here the initial stroke is of a different category, to enter a very different character. Notice that the selection list is very different from what it was with the first stroke of the previous character.

FIG. 4J illustrates the Stroke Recognition Interface when two strokes are added. Note that the strokes entered already form a character that matches the most likely choice in the selection list. The character that we are aiming for in this example is already displayed (see the fifth character from the left) after the second stroke is added. But we want to continue to demonstrate the disambiguation feature of the system.

FIG. 4K illustrates the Stroke Recognition Interface when the third stroke is added. After a third stroke is entered, the selection list contains two characters that are only slightly different from each other. In fact, these two characters have exactly the same stroke order, and choosing from the selection list is the only way to disambiguate the two characters. Note that the second character, the one being pointed to, is not only less commonly used than the first but also slightly more complex.

FIG. 4L illustrates the Stroke Recognition Interface where the desired character is indicated. Note that the desired character was first visible after the second stroke was entered, and is still a likely choice in the selection list (see the fourth character from the left). If a desired character is removed from the selection list for some reason, it is an indication that the stroke order entered by the user does not match the Government Standard stroke order used in the system.

FIG. 4M illustrates the Stroke Recognition Interface when the second desired character is selected. The character is selected, and added to the message. It is a 9-stroke character. We selected it at three strokes, but could have selected it at two strokes.

FIG. 4N illustrates the Stroke Recognition Interface where a third desired character appears in the most frequently used characters. For very common characters, there is no need to enter any strokes. The ten most frequently used characters are displayed even when no strokes are entered. If the user wants to enter one of these common characters, simply selecting it will add it to the message. Note that the selection list of the most frequently used characters is context sensitive. The system displays the ten most frequent characters to follow the last character entered.
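The zero-stroke, context-sensitive list might be modeled as a table of follower counts keyed by the last character entered, falling back to overall usage counts when there is no prior character. The following Python sketch assumes exactly that; the tables FOLLOWERS and OVERALL, their counts, and the function name are hypothetical and serve only to illustrate the behavior described for FIG. 4N.

```python
from collections import Counter

# Hypothetical counts of which characters follow a given character.
FOLLOWERS = {
    "春": Counter({"天": 50, "节": 30, "风": 10}),
}

# Hypothetical overall usage counts, used when no prior character exists
# or when nothing is known about what follows it.
OVERALL = Counter({"的": 1000, "一": 900, "是": 800, "天": 450})


def zero_stroke_list(prev_char=None, limit=10):
    """Return the selection list shown before any stroke is entered."""
    table = FOLLOWERS.get(prev_char) or OVERALL
    return [char for char, _ in table.most_common(limit)]


print(zero_stroke_list())        # start of a message: overall frequency
print(zero_stroke_list("春"))    # after a character: its common followers
```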

FIG. 4O illustrates the Stroke Recognition Interface when a third desired character is selected without adding any stroke. This is a saving of seven to one for the third character.
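The savings quoted in this walkthrough can be checked with a short calculation: the number of strokes in the full character divided by the movements actually made, where the movements are the strokes entered plus one selection. The function name below is hypothetical, and the seven-stroke count for the third character is inferred from the seven-to-one figure rather than stated directly.

```python
def movement_savings(total_strokes, strokes_entered):
    """Ratio of strokes in the full character to movements actually made."""
    movements = strokes_entered + 1  # entered strokes plus one selection
    return total_strokes / movements


print(movement_savings(15, 3))  # first character: 15 strokes, 4 movements
print(movement_savings(9, 3))   # second character, selected at three strokes
print(movement_savings(9, 2))   # second character, had it been selected at two
print(movement_savings(7, 0))   # third character, selected with no strokes
```

The first ratio is 3.75, which the description rounds to about four to one.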

FIG. 5 illustrates a recommended layout of the input interface according to the most preferred embodiment, in which the message area is omitted because the text goes directly into the active application.

In a typical embodiment, the stroke entry means is a handwriting input area displayed on a touchscreen on a PDA. Each entered stroke is recognized as one of a set of stroke categories. The graphical keys, each assigned to a stroke category, are optionally available to display and enter strokes, as an alternative input means. One of the graphical keys represents “match any stroke category”.

The method described above may be carried out by a computer usable medium containing instructions in computer readable form. In other words, the method may be incorporated in a computer program, a logic device, mobile device, or firmware and/or may be downloaded from a network, e.g. a Web site over the Internet. It may be applied in all sorts of text entry.

Although the invention is described herein with reference to some preferred embodiments, one skilled in the art will readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit and scope of the present invention.

Accordingly, the invention should only be limited by the claims included below.

1. A Chinese character handwriting input system, comprising: recognition means for recognizing a category of handwriting stroke from a predefined number of stroke categories; recognition means for recognizing one or more categories of handwriting stroke from a predefined number of stroke categories; collection means for organizing a list of characters that commonly start with said more than one recognized category of handwriting stroke, said list of characters being displayed in a predefined sequence, wherein said predefined sequence is based on any of: number of strokes necessary to write out a character; use frequency of a character; and contextual relation to the last character entered; and selection means for selecting a desired character from said list of characters.

2. The system of claim 1, further comprising: wildcard entry means for matching any stroke category.

3. A Chinese character handwriting input system, comprising: recognition means for recognizing a category of handwriting stroke from a predefined number of stroke categories; and collection means for organizing a list of characters that commonly start with one or more recognized categories of handwriting stroke, said list of characters being displayed in a predefined sequence, wherein said predefined sequence is based on any of: number of strokes necessary to write out a character; use frequency of a character; and contextual relation to the last character entered; selection means for selecting a desired character from said list of characters; wherein said predetermined number of stroke categories comprise more than five basic categories.

4. A method for inputting handwritten Chinese characters, comprising the steps of: adding a stroke into a pattern recognition system; categorizing said added stroke into one of a predefined number of stroke categories; finding characters based on frequency of character use; and displaying a list of found characters.

5. The method of claim 4, further comprising the steps of: if a desired character is in said list, selecting said desired character from said list; if a desired character is not visible in said list, adding another stroke; and displaying another list of found characters.

6. The method of claim 4, further comprising the steps of: displaying a numeric or iconic representation for a stroke that is added; and displaying full stroke numeric or iconic representation for a character that is selected.

7. The method of claim 4, further comprising the steps of: if a desired character is in said list, either of selecting said desired character from said list or adding another stroke and displaying another list of found characters.

8. The method of claim 4, further comprising the step of: retaining an ink trail of each stroke that is added until a character is selected.

9. The method of claim 8, further comprising the step of: color coding each ink trail either to indicate a level of confidence or differentiation in said categorization step.

10. The method of claim 4, further comprising the step of: prompting a user to clarify between ambiguous stroke interpretations and/or to remedy a stroke's misinterpretation.

11. The method of claim 4, further comprising the step of: providing means for removing one or more strokes of an input stroke sequence in reverse order.

12. The method of claim 4, further comprising the step of: providing means for matching any of Latin letters, punctuation symbols, and emoticons with predefined or user-defined stroke sequences.

13. The method of claim 4, further comprising the step of: selecting a character from said list with a user gesture; wherein said user gesture allows said user to begin entry of strokes for a next character.

14. The method of claim 4, further comprising the step of: providing user-defined gestures for any of stroke categories, sequences of strokes, and character components.

15. The method of claim 4, further comprising the step of: providing means for explicit selection of stroke categories.

16. The method of claim 4, further comprising the step of: displaying character components that start with one or more recognized stroke categories; wherein selecting a character component results in the display of only the characters containing or starting with said selected component.

17. The method of claim 4, further comprising the step of: allowing alternative stroke sequences for character or character component entry.

18. The method of claim 4, said step of finding characters based on frequency of use further comprising the step of: finding said characters based on context.